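Meta-gradients provide a general approach for optimizing the meta-parameters of reinforcement learning (RL) algorithms. Estimation of meta-gradients is central to the performance of these meta-algorithms, and has been studied in the setting of MAML-style short-horizon meta-RL problems. In this context, prior work has investigated the estimation of the Hessian of the RL objective, and has tackled the problem of credit assignment to pre-adaptation behavior by making a sampling correction. However, we show that Hessian estimation, implemented for example by DiCE and its variants, always adds bias and can also add variance to meta-gradient estimation. Meanwhile, meta-gradient estimation has been studied less in the important long-horizon setting, where backpropagation through the full inner optimization trajectories is not feasible. We study the bias and variance trade-off arising from truncated backpropagation and sampling correction, and additionally compare to evolution strategies, a recently popular alternative for the long-horizon setting. While prior work implicitly chooses points in this bias-variance space, we disentangle the sources of bias and variance and present an empirical study that relates existing estimators to each other.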
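As a rough illustration of the evolution-strategies alternative mentioned in this abstract, the sketch below estimates the gradient of a black-box objective from antithetic Gaussian perturbations, without any backpropagation. The function names, the scalar parameter, and the quadratic `toy_objective` are assumptions made for illustration; they are not taken from the paper.

```python
import random


def es_gradient(f, theta, sigma=0.1, n_pairs=2000, seed=0):
    """Antithetic evolution-strategies estimate of df/dtheta for a scalar theta.

    Only evaluations of f are used, so f can hide an arbitrarily long inner
    optimization loop. All names and defaults here are illustrative.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        eps = rng.gauss(0.0, 1.0)
        # Evaluate the objective at a mirrored pair of perturbed parameters.
        total += (f(theta + sigma * eps) - f(theta - sigma * eps)) * eps
    return total / (2.0 * sigma * n_pairs)


def toy_objective(theta):
    """Hypothetical stand-in for a meta-objective; the true gradient is -2 * (theta - 3)."""
    return -(theta - 3.0) ** 2


estimate = es_gradient(toy_objective, 0.0)
print(estimate)  # a noisy estimate of the true gradient, which is 6 at theta = 0
```

The estimate is noisy for any finite number of perturbation pairs, which is the variance side of the trade-off the abstract discusses.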
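We present a new monotonic improvement guarantee for optimizing decentralized policies in cooperative multi-agent reinforcement learning (MARL), which holds even when the transition dynamics are non-stationary. This new analysis provides a theoretical understanding of the strong performance of two recent actor-critic methods for MARL, Independent Proximal Policy Optimization (IPPO) and Multi-Agent PPO (MAPPO), which both rely on independent ratios, i.e., computing probability ratios separately for each agent's policy. We show that, despite the non-stationarity that independent ratios cause, a monotonic improvement guarantee still arises as a result of enforcing a trust region constraint over all decentralized policies. We also show that this trust region constraint can be effectively enforced in a principled way by bounding independent ratios based on the number of agents in training, providing a theoretical foundation for proximal ratio clipping. Moreover, we show that the surrogate objectives optimized in IPPO and MAPPO are essentially equivalent when their critics converge to a fixed point. Finally, our empirical results support the hypothesis that the strong performance of IPPO and MAPPO is a direct result of enforcing such a trust region constraint via clipping in centralized training, and that good values of the hyperparameters for this enforcement are highly sensitive to the number of agents, as predicted by our theoretical analysis.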
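The idea of independent ratios with proximal clipping can be sketched as follows. This is a minimal illustration, assuming a single joint sample, a shared advantage estimate supplied by some critic, and a hypothetical function name; it is not the IPPO or MAPPO implementation.

```python
import math


def clipped_surrogate(new_logps, old_logps, advantage, eps=0.2):
    """PPO-style clipped objective for one joint sample, with one ratio per agent.

    new_logps / old_logps: per-agent log-probabilities of the action each agent took.
    advantage: a shared advantage estimate (assumed given).
    eps: clipping range, a hyperparameter.
    Returns the mean over agents of min(r * A, clip(r, 1 - eps, 1 + eps) * A).
    """
    terms = []
    for new_lp, old_lp in zip(new_logps, old_logps):
        ratio = math.exp(new_lp - old_lp)  # independent ratio for this agent only
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        terms.append(min(ratio * advantage, clipped * advantage))
    return sum(terms) / len(terms)


# Two agents; the second agent's policy has drifted from the old policy.
value = clipped_surrogate([0.0, math.log(2.0)], [0.0, 0.0], advantage=1.0)
print(value)
```

Clipping each agent's ratio separately bounds how far every decentralized policy can move in one update, which is the mechanism the abstract connects to a trust region constraint.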
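Consistency is the theoretical property of a meta-learning algorithm that guarantees that, under certain conditions, it can adapt to any task at test time. An open question is whether and how theoretical consistency translates into practice, in comparison to inconsistent algorithms. In this paper, we empirically investigate this question on a set of representative meta-RL algorithms. We find that theoretically consistent algorithms can indeed usually adapt to out-of-distribution (OOD) tasks, while inconsistent ones cannot, although consistent algorithms can still fail in practice for reasons such as poor exploration. We further find that theoretically inconsistent algorithms can be made consistent by continuing to update all agent components on the OOD tasks, and that they then adapt as well as or better than originally consistent ones. We conclude that theoretical consistency is indeed a desirable property, and that inconsistent meta-RL algorithms can easily be made consistent to enjoy the same benefits.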
Biological systems and processes are networks of complex nonlinear regulatory interactions between nucleic acids, proteins, and metabolites. A natural way in which to represent these interaction networks is through the use of a graph. In this formulation, each node represents a nucleic acid, protein, or metabolite and edges represent intermolecular interactions (inhibition, regulation, promotion, coexpression, etc.). In this work, a novel algorithm for the discovery of latent graph structures given experimental data is presented.
The Government of Kerala increased the frequency of its free food kit supply owing to the pandemic; however, the items in these kits were static and did not reflect the personal preferences of consumers. This paper conducts a comparative analysis of various clustering techniques on a scaled-down version of a real-world dataset obtained through a conjoint analysis-based survey. Clustering by centroid-based methods such as k-means is analyzed and the results are plotted along with SVD, and a conclusion is reached as to which of the two is better. Once the clusters have been formed, commodities are decided upon for each cluster. Clustering is further enhanced by reassignment based on a specific cluster loss threshold. Thus, the most efficacious clustering technique for designing a food kit tailored to the needs of individuals is obtained.
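Centroid-based clustering of the kind named in this abstract can be sketched with a plain k-means loop. The data points, the deterministic initialization, and the function name below are assumptions for illustration only; they are unrelated to the survey dataset, and the reassignment step based on a cluster loss threshold is not shown.

```python
def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length numeric tuples.

    Initialization simply takes the first k points, which keeps the sketch
    deterministic; practical implementations usually use a randomized scheme.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean).
        labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical two-feature preference scores for six respondents.
scores = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels, centroids = kmeans(scores, k=2)
print(labels)
```

Each resulting cluster could then be mapped to its own set of commodities, as the abstract describes.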
Over the past decade, neural networks have been successful at making predictions from biological sequences, especially in the context of regulatory genomics. As in other fields of deep learning, tools have been devised to extract features such as sequence motifs that can explain the predictions made by a trained network. Here we intend to go beyond explainable machine learning and introduce SEISM, a selective inference procedure to test the association between these extracted features and the predicted phenotype. In particular, we discuss how training a one-layer convolutional network is formally equivalent to selecting motifs maximizing some association score. We adapt existing sampling-based selective inference procedures by quantizing this selection over an infinite set to a large but finite grid. Finally, we show that sampling under a specific choice of parameters is sufficient to characterize the composite null hypothesis typically used for selective inference, a result that goes well beyond our particular framework. We illustrate the behavior of our method in terms of calibration, power and speed and discuss its power/speed trade-off with a simpler data-split strategy. SEISM paves the way to an easier analysis of neural networks used in regulatory genomics, and to more powerful methods for genome-wide association studies (GWAS).
Prior works on improving speech quality with visual input typically study each type of auditory distortion separately (e.g., separation, inpainting, video-to-speech) and present tailored algorithms. This paper proposes to unify these subjects and study Generalized Speech Enhancement, where the goal is not to reconstruct the exact reference clean signal, but to focus on improving certain aspects of speech. In particular, this paper concerns intelligibility, quality, and video synchronization. We cast the problem as audio-visual speech resynthesis, which is composed of two steps: pseudo audio-visual speech recognition (P-AVSR) and pseudo text-to-speech synthesis (P-TTS). P-AVSR and P-TTS are connected by discrete units derived from a self-supervised speech model. Moreover, we utilize a self-supervised audio-visual speech model to initialize P-AVSR. The proposed model is coined ReVISE. ReVISE is the first high-quality model for in-the-wild video-to-speech synthesis and achieves superior performance on all LRS3 audio-visual enhancement tasks with a single model. To demonstrate its applicability in the real world, ReVISE is also evaluated on EasyCom, an audio-visual benchmark collected under challenging acoustic conditions with only 1.6 hours of training data. Similarly, ReVISE greatly suppresses noise and improves quality. Project page: https://wnhsu.github.io/ReVISE.
Due to the high activation sparsity and use of accumulates (AC) instead of expensive multiply-and-accumulates (MAC), neuromorphic spiking neural networks (SNNs) have emerged as a promising low-power alternative to traditional DNNs for several computer vision (CV) applications. However, most existing SNNs require multiple time steps for acceptable inference accuracy, hindering real-time deployment and increasing spiking activity and, consequently, energy consumption. Recent works proposed direct encoding that directly feeds the analog pixel values in the first layer of the SNN in order to significantly reduce the number of time steps. Although the overhead for the first layer MACs with direct encoding is negligible for deep SNNs and the CV processing is efficient using SNNs, the data transfer between the image sensors and the downstream processing costs significant bandwidth and may dominate the total energy. To mitigate this concern, we propose an in-sensor computing hardware-software co-design framework for SNNs targeting image recognition tasks. Our approach reduces the bandwidth between sensing and processing by 12-96x and the resulting total energy by 2.32x compared to traditional CV processing, with a 3.8% reduction in accuracy on ImageNet.
Language models (LMs) often generate incoherent outputs: they refer to events and entity states that are incompatible with the state of the world described in their inputs. We introduce SituationSupervision, a family of approaches for improving coherence in LMs by training them to construct and condition on explicit representations of entities and their states. SituationSupervision has two components: an auxiliary situation modeling task that trains models to predict state representations in context, and a latent state inference procedure that imputes these states from partially annotated training data. SituationSupervision can be applied to both fine-tuning (by supervising LMs to encode state variables in their hidden representations) and prompting (by inducing LMs to interleave textual descriptions of entity states with output text). In both cases, SituationSupervision requires only a small number of state annotations to produce major coherence improvements (between 4% and 11%), showing that standard LMs can be sample-efficiently trained to model not just language but the situations it describes.
We describe PromptBoosting, a query-efficient procedure for building a text classifier from a neural language model (LM) without access to the LM's parameters, gradients, or hidden representations. This form of "black-box" classifier training has become increasingly important as the cost of training and inference in large-scale LMs grows. But existing black-box LM classifier learning approaches are themselves computationally inefficient, typically specializing LMs to the target task by searching in a large space of (discrete or continuous) prompts using zeroth-order optimization methods. Instead of directly optimizing in prompt space, PromptBoosting obtains a small pool of prompts via a gradient-free approach and then constructs a large pool of weak learners by pairing these prompts with different elements of the LM's output distribution. These weak learners are then ensembled using the AdaBoost algorithm. The entire learning process requires only a small number of forward passes and no backward pass. Experiments show that PromptBoosting achieves state-of-the-art performance in multiple black-box few-shot classification tasks, and matches or outperforms full fine-tuning in both few-shot and standard learning paradigms, while training 10x faster than existing black-box methods.
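The ensembling step can be illustrated with a small binary AdaBoost over a fixed pool of weak learners. In this sketch each weak learner is just a precomputed list of +1/-1 predictions, standing in for a prompt paired with an element of the LM's output distribution; the function names, the binary setting, and the toy predictions are assumptions for illustration, not details from the paper.

```python
import math


def adaboost(weak_preds, labels, rounds=3):
    """Binary AdaBoost over a fixed pool of weak learners.

    weak_preds[j][i] is learner j's prediction (+1 or -1) on example i.
    Returns a list of (alpha, learner_index) pairs.
    """
    n = len(labels)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Weighted error of every learner in the pool under the current weights.
        errs = [
            sum(w for w, p, y in zip(weights, preds, labels) if p != y)
            for preds in weak_preds
        ]
        best = min(range(len(weak_preds)), key=errs.__getitem__)
        err = min(max(errs[best], 1e-10), 1.0 - 1e-10)
        if err >= 0.5:  # no learner is better than chance on the reweighted data
            break
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, best))
        # Up-weight the examples the chosen learner got wrong, then renormalize.
        weights = [
            w * math.exp(-alpha * y * p)
            for w, p, y in zip(weights, weak_preds[best], labels)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, weak_preds, i):
    """Sign of the alpha-weighted vote on example i."""
    score = sum(alpha * weak_preds[j][i] for alpha, j in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical labels and three weak learners, each wrong on a different example.
labels = [1, 1, -1, -1, 1, -1]
pool = [
    [1, 1, -1, -1, -1, -1],
    [1, -1, -1, -1, 1, -1],
    [1, 1, 1, -1, 1, -1],
]
ensemble = adaboost(pool, labels)
print([predict(ensemble, pool, i) for i in range(len(labels))])
```

Because the weak learners' predictions are computed once and reused, boosting adds no further model queries, which is in keeping with the forward-pass-only setting the abstract describes.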